Ensemble of online neural networks for non-stationary and imbalanced data streams
Authors
Abstract
Concept drift (non-stationarity) and class imbalance are two important challenges for supervised classifiers. "Concept drift" refers to changes in the underlying function being learnt, and class imbalance is a vast difference between the numbers of instances in different classes of data. Class imbalance is an obstacle to the efficiency of most classifiers. Research on classification of non-stationary and imbalanced data streams focuses mainly on batch solutions, whereas online methods are more appropriate. Here, we propose an online ensemble of neural network (NN) classifiers. Ensemble models are the most frequently used methods for classifying non-stationary and imbalanced data streams. The main contribution is a two-layer approach for handling class imbalance and non-stationarity. In the first layer, cost-sensitive learning is embedded into the training phase of the NNs, and in the second layer a new method for weighting the classifiers of the ensemble is proposed. The proposed method is evaluated on 3 synthetic and 8 real-world datasets. The results show statistically significant improvement compared to online ensemble methods with similar features. © 2013 Elsevier B.V. All rights reserved.
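The two layers described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the cost function or the weighting rule, so the inverse-frequency class cost, the decay factor, the network size, and the score-based member weighting below are all assumptions chosen for illustration, and the class names `CostSensitiveOnlineNet` and `OnlineEnsemble` are hypothetical.

```python
import numpy as np


class CostSensitiveOnlineNet:
    """Small one-hidden-layer network trained one instance at a time (labels 0/1)."""

    def __init__(self, n_inputs, n_hidden, lr, rng):
        self.w1 = rng.normal(0.0, 0.5, size=(n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict_proba(self, x):
        h = np.tanh(self.w1 @ x + self.b1)
        return 1.0 / (1.0 + np.exp(-(self.w2 @ h + self.b2)))

    def update(self, x, y, cost):
        # Layer 1 (assumed form): the cross-entropy gradient is scaled by the
        # class cost, so mistakes on the rarer class move the weights further.
        h = np.tanh(self.w1 @ x + self.b1)
        p = 1.0 / (1.0 + np.exp(-(self.w2 @ h + self.b2)))
        delta_out = cost * (p - y)
        delta_hidden = delta_out * self.w2 * (1.0 - h ** 2)
        self.w2 -= self.lr * delta_out * h
        self.b2 -= self.lr * delta_out
        self.w1 -= self.lr * np.outer(delta_hidden, x)
        self.b1 -= self.lr * delta_hidden


class OnlineEnsemble:
    """Weighted vote over online networks; weights track recent performance."""

    def __init__(self, n_inputs, n_members=5, decay=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.members = [
            CostSensitiveOnlineNet(n_inputs, 4, 0.1, rng) for _ in range(n_members)
        ]
        self.decay = decay                # assumed forgetting factor for drift
        self.class_counts = np.ones(2)    # decayed class frequencies
        self.scores = np.ones(n_members)  # decayed, cost-weighted correctness

    def class_cost(self, y):
        # Assumed cost: inverse of the (decayed) relative class frequency.
        return self.class_counts.sum() / (2.0 * self.class_counts[y])

    def member_weights(self):
        s = self.scores + 1e-12           # guard against an all-zero score vector
        return s / s.sum()

    def predict(self, x):
        probs = np.array([m.predict_proba(x) for m in self.members])
        return int(self.member_weights() @ probs >= 0.5)

    def learn_one(self, x, y):
        self.class_counts *= self.decay
        self.class_counts[y] += 1.0
        cost = self.class_cost(y)
        for i, m in enumerate(self.members):
            # Layer 2 (assumed form): score each member before it trains on x.
            correct = float((m.predict_proba(x) >= 0.5) == bool(y))
            self.scores[i] = self.decay * self.scores[i] + cost * correct
            m.update(x, y, cost)
```

In a stream setting this would typically be used in a test-then-train loop: call `predict(x)` on each arriving instance, record the result, then call `learn_one(x, y)` once the label is available. The decay factor lets both the class costs and the member weights forget old data, which is one simple way to react to drift.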
Similar resources
Recursive least square perceptron model for non-stationary and imbalanced data stream classification
Classifying non-stationary and imbalanced data streams encompasses two important challenges, namely concept drift and class imbalance. "Concept drift" (or non-stationarity) refers to changes in the underlying function being learnt, and class imbalance is a vast difference between the numbers of instances in different classes of data. Class imbalance is an obstacle to the efficiency of most classifiers...
Learning Framework for Non-stationary and Imbalanced Data Stream
Although learning on non-stationary data and on imbalanced data have each been studied extensively in the literature, little work has been done to tackle the imbalance issue on non-stationary data streams, where the joint probability distribution between the data and classes changes with time and may result in a skewed class distribution. Especially in airline delay detection, data ...
A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance
Textual stream classification has become a realistic and challenging issue since large-scale, high-dimensional, and non-stationary streams with class imbalance have been widely used in various real-life applications. Given the characteristics of textual streams, classifying them is technically difficult, especially in an imbalanced environment. In this paper, ...
Ensemble learning for data stream analysis: A survey
In many applications of information systems, learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary character...
Parallel Online Continuous Arcing with a Mixture of Neural Networks
This paper presents a new arcing (boosting) algorithm called POCA, Parallel Online Continuous Arcing. Unlike traditional arcing algorithms (such as AdaBoost), which construct an ensemble by adding and training weak learners sequentially on a round-by-round basis, training in POCA is performed over an entire ensemble continuously and in parallel. Since members of the ensemble are not frozen after...
Journal title:
- Neurocomputing
Volume 122, issue
Pages -
Publication date 2013